Author
|
Topic: Scoring Methods
|
Polyscoring Member
|
posted 02-22-2010 08:15 AM
Hello all, I see that many people use Don Krapohl's "Defensible Dozen" and other interesting scoring methods. I also see schools teaching the DACA scoring method from DACA Manual 503 (2006) because it is standardized and examiners can refer to it, but it is also on Antipolygraph's webpage. Is there an acceptable method for scoring charts that is practically UNIVERSAL? Thanks for your thoughts! IP: Logged |
Barry C Member
|
posted 02-22-2010 09:42 AM
I'm sure Don would be flattered that you consider them his, but the scoring features he lists in that dozen or so were validated by many studies before many of us came along. What you're seeing with the scoring systems (such as DACA) is that they are all getting in line with the science. Essentially, the scoring features are the same for DACA and Utah, and they are what are in ASTM's standards. There are some minor differences, but nothing all that noteworthy. (Backster also uses pretty much the same features. He just has a lot of rules built around what one does with them when scoring.)

The differences are in how to score those features. For example, Utah requires a 2:1 ratio to score an EDA reaction, and DACA allows for less based on the "bigger is better" rule. Both are still looking for the greater amplitude, so they are looking at the same criteria listed in that dozen or so Don lists. There is also now the simplified scoring system (which you can read about in the current issue of Polygraph). It too uses the same features, but we have the ability to do some things with the data that weren't possible before.

So, yes, there are universal criteria, but what you do with them is not. It's hard to have a universal system, as that implies a one-size-fits-all system, and that won't work. What cut-offs to use, for example, is a policy decision. Do you want to catch more liars or have balanced accuracies? The answer to that question determines what you'll have to do.
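To make that difference concrete, here is a minimal sketch (not any school's official rule set) contrasting a 2:1 ratio requirement with a simple bigger-is-better comparison for one RQ/CQ pair. The EDA amplitudes are hypothetical, in arbitrary chart units:

```python
# Minimal sketch contrasting two ways of assigning an EDA score from the
# same amplitude comparison. Amplitude values are hypothetical.

def ratio_rule_score(rq_amp, cq_amp, ratio=2.0):
    """Score only when the larger response is at least `ratio` times the smaller."""
    if cq_amp >= ratio * rq_amp:
        return +1   # comparison question clearly larger -> truthful-direction score
    if rq_amp >= ratio * cq_amp:
        return -1   # relevant question clearly larger -> deceptive-direction score
    return 0        # difference not large enough to score

def bigger_is_better_score(rq_amp, cq_amp):
    """Score whichever response is larger, with no minimum ratio."""
    if cq_amp > rq_amp:
        return +1
    if rq_amp > cq_amp:
        return -1
    return 0

rq, cq = 1.4, 2.1  # hypothetical EDA amplitudes for one RQ/CQ pair
print(ratio_rule_score(rq, cq))        # 0  (2.1 / 1.4 = 1.5, short of 2:1)
print(bigger_is_better_score(rq, cq))  # +1 (CQ is simply the larger response)
```

Same feature (relative amplitude), different rule for when it earns a score. IP: Logged |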
Polyscoring Member
|
posted 02-23-2010 11:13 AM
Thank you Barry. The situation is that I work with federal agencies outside the United States that all went to different polygraph schools. They seem to be rather competitive with one another and look to us for the answer, which is complicated. Do you happen to have a digital copy of the Journal? I am currently outside the States. Thanks. IP: Logged |
rnelson Member
|
posted 02-23-2010 03:37 PM
I will take Barry's statements even further. People who function as rule-bound proceduralists will undoubtedly be handicapped by the apparent (minor) differences between the "defensible dozen," Utah features, ASTM features, and DACA features. Those differences are so minor that it is unlikely that anyone will ever find a statistically significant difference in their performance. Instead, it is more likely that these features will perform essentially equivalently. What may make a little difference is the rule-sets by which we make comparisons and transformations of the matrix of several presentations of CQs and RQs – using arbitrary ratios that violate our existing knowledge about non-linear physiology, or a simple but robust non-parametric bigger-is-better rule that makes no assumptions about the linearity or shape of the underlying physiological data.

Barry is right about cutscores affecting sensitivity to deception and specificity to truthfulness. However, the best way to set cutscores is through the evaluation of statistical estimates of the rates of error (statistical significance) that are achieved by various cutscores – not through some blind adjustment.

Keep in mind that polygraph ain't rocket science... so anything that is available in print is probably not so esoteric that it could not be universally understood by reasonably intelligent persons. "Universal" acceptance is not likely until we can answer the most difficult questions that our smartest scientific-minded opponents would ask us - if, and when, they get the chance (which will most likely be in court). One of those questions will be this: "What is the level of statistical significance that is achieved by our reported 'significant reaction' when we manually score the test?" This question goes directly to a core issue in Daubert requirements. Computer algorithms can answer this question easily. Until now, no one could really satisfactorily answer these questions with manual scores, especially with screening exams, but we are working on it...

There is an article in the recent Polygraph journal describing a validation study on an empirically based approach to manual scoring - which can answer that difficult question. In addition, we now have Monte Carlo validation experiments, including accuracy estimates with statistical confidence intervals, along with Monte Carlo norms for mixed-issue screening exams - which is only the beginning of something that some thought would never happen.

What I have found is that some examiners don't even understand the question (statistical significance), the importance of the question, and why it is fundamental to any assertion that the polygraph should be more widely accepted as a scientific test. However, our scientific-minded opponents absolutely understand these questions and their importance - and our opponents will continue to recruit experts like Zellicof to attack us, as they have done in New Mexico. I suspect that there are some examiners who harbor secret insecurities that they are not capable of understanding polygraph science, and may even doubt that polygraph can meet the rigorous requirements of science. Even more concerning is the childlike belief that we have already learned all the answers we will ever need – and the corresponding apathy about the need to continue to study, learn, and incorporate NEW information and new knowledge into the matrix of our present field polygraph methods.
What is universal is that ALL scoring systems, whether manual or automated, will include some common elements (a minimal sketch of how these pieces fit together appears at the end of this post), including:

1) features within the recorded physiological data, which can be observed and measured/scored as reactions to the test stimuli;

2) transformation methods, which combine the observed/measured features, across several iterations of the test stimuli, into a single numerical index for the exam as a whole, and perhaps for individual investigation targets;

3) decision policies, which structure how to sort the various combinations of total and question index scores into the dichotomous categories of positive and negative (leaving aside inconclusive results), which are then interpreted as indicative of truthfulness or deception; and

4) normative data, in the form of parameter estimates, that describe the location and shape (mean and variance) of the distributions of truthful and deceptive scores, and which can be used to calculate the level of statistical significance (probability of error) when classifying a single examination as either truthful or deceptive.

The problem (which will be ongoing) is that we do not yet know everything we need to know. Again, this will irritate a few rule-bound individuals who cannot tolerate the anxiety of ambiguity that results from the need to think for oneself and make some choice (vs. simply doing whatever you are told to do) – and their impulse will be to try to "decide" on a "correct" way, even in the absence of knowledge to guide the best decision. The problem is that pretending to know the answers to questions for which we do not really have the answer (from research) will ultimately make us DUMBER in the end. This is because we will neglect to seek the actual answer, and will be reluctant to ever admit that we have not been doing that which would be best. So, we actually want some dialectic and competitive tension – because it prompts us to continue to learn the answers to our important questions, instead of pretending we already know (we don't). Pretend we know the answers, and learning will stagnate for a long, long time.

What we have now is nearly 30 years of research, including different groups of researchers at different universities (can you say "convergent validity"), that tell us that field examiners should be paying attention to about 12 different features. We also now have a replication, mentioned by Barry, of a manual scoring experiment in which inexperienced examiners scored a sample of cases with accuracy and inter-rater reliability that was as good as or better than experienced examiners. Think about that: most studies attempt to test a scoring model under best-case situations. The proper way to investigate a method is to test it under adverse or worst-case scenarios. This is not to suggest that experience does not matter – it does. What the results do suggest is that an empirically based scoring method can be taught and learned with good results.

Polyscoring, your email address is not listed. Send me your email address and I can send you a scan of the study that Barry mentioned. raymond.nelson@gmail.com

.02
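Here is the minimal sketch referred to above – the four elements as a toy pipeline. Every feature value, weight, cutscore, and norm below is invented for illustration and does not come from any published scoring system:

```python
# Toy illustration of the four common elements: features, transformation,
# decision policy, and normative data. All numbers are invented.
from statistics import NormalDist

# 1) Features: measured reactions, here as hypothetical RQ/CQ ratios per
#    channel per chart (values > 1 mean stronger reactions to the RQs).
features = {
    "pneumo": [0.9, 1.1, 1.0],
    "eda":    [1.6, 1.4, 1.8],
    "cardio": [1.2, 1.3, 1.1],
}

# 2) Transformation: combine the channel measurements into one numerical index.
weights = {"pneumo": 0.2, "eda": 0.5, "cardio": 0.3}  # hypothetical weights
index = sum(weights[ch] * (sum(v) / len(v)) for ch, v in features.items())

# 3) Decision policy: cutscores that sort the index into categories.
def decide(score, lower=0.9, upper=1.1):
    if score >= upper:
        return "significant reactions"
    if score <= lower:
        return "no significant reactions"
    return "inconclusive"

# 4) Normative data: distribution parameters used to attach a probability
#    of error to the classification (hypothetical truthful norms).
truthful_norms = NormalDist(mu=1.0, sigma=0.15)
p_error = 1.0 - truthful_norms.cdf(index)  # chance a truthful person scores this high

print(round(index, 2), decide(index), round(p_error, 4))
```

------------------ "Gentlemen, you can't fight in here. This is the war room." --(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)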
IP: Logged |
Ted Todd Member
|
posted 02-23-2010 03:58 PM
Ray, can you CC me the study as well? THX Ted ted.todd@comcast.net IP: Logged |
rnelson Member
|
posted 02-24-2010 06:59 AM
Ted, your email bounced. Is there a better address? r IP: Logged |
skar Member
|
posted 08-03-2010 11:08 AM
My polygraph does not have a very good computer algorithm, so I do not use that function. I do not know much about computer algorithms. I have read, for example, about OSS-3 here: http://www.oss3.info/. I saw different formulas there, but I do not know how to use them, and I have questions. What are the best computer algorithms nowadays for CQT (Utah ZCT, Federal ZCT, AFMGQT ...), R/I, and CIT tests? Is it possible to use modern computer algorithms without the polygraph software, or is that difficult or a commercial secret? For example, is it possible to put some of the formulas of these computer algorithms into Excel, enter the pneumo, EDA, and cardio values (RLL, amplitudes..., I can get these values), and calculate a probability? If it is possible, where can I find information on how to do this? Thank you. [This message has been edited by skar (edited 08-03-2010).] IP: Logged |
Barry C Member
|
posted 08-04-2010 10:41 AM
All of the algorithms work about the same (accuracy-wise). You can use OSS-3 on any type of CQT. Ray Nelson has made it available in a spreadsheet, so if you can export the Kircher feature measurements from your polygraph software (some will allow you to do that), it will work for you. It also computes the OSS-1 or OSS-2 scores depending on the type of CQT used.

The math behind OSS-3 is difficult to follow, and over the heads of many. The CPS algorithm is based on a discriminant analysis and some Bayesian computations, and it's easier to follow, but it's still a cure for insomnia unless you really like that aspect of things.
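For the curious, the general shape of a discriminant-plus-Bayes approach can be sketched as follows. The coefficients, distributions, and prior are invented for illustration; the actual CPS discriminant function and priors are not reproduced here:

```python
# Rough sketch of a discriminant-plus-Bayes idea: a weighted combination of
# channel scores (the discriminant score) is converted to a posterior
# probability of deception with Bayes' rule. All numbers are invented.
from statistics import NormalDist

def posterior_deceptive(pneumo, eda, cardio, prior=0.5):
    # Hypothetical linear discriminant: weighted sum of standardized channel scores.
    d = 0.3 * pneumo + 0.5 * eda + 0.2 * cardio

    # Hypothetical class-conditional distributions of the discriminant score.
    deceptive = NormalDist(mu=+1.0, sigma=1.0)
    truthful  = NormalDist(mu=-1.0, sigma=1.0)

    # Bayes' rule: posterior odds = likelihood ratio * prior odds.
    likelihood_ratio = deceptive.pdf(d) / truthful.pdf(d)
    posterior_odds = likelihood_ratio * (prior / (1 - prior))
    return posterior_odds / (1 + posterior_odds)

print(posterior_deceptive(pneumo=0.4, eda=1.2, cardio=0.6))  # about 0.84 with these made-up inputs
```

IP: Logged |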
rnelson Member
|
posted 08-07-2010 09:07 AM
Barry, I agree with you mostly. However, if you look at the details, both OSS-3 and the CPS algorithm use discriminant analysis to calculate the weighting function for the combination of the pneumo, EDA, and cardio. OSS-3 and CPS use the same physiological features, which have shown up in the scientific studies on feature extraction for 20-30 years now. However, OSS-3 uses weighted averaging, and CPS uses a classical discriminant function. Both OSS-3 and CPS use standardized z-scores. But CPS uses a z-score of the raw measurements, and OSS-3 uses a z-score of the logged R/C ratios. CPS averages all the data first for all components, between charts, and then aggregates the component scores via the discriminant function. OSS-3 first uses weighted averaging within each chart, for each presentation of each target stimulus, and then aggregates the data for each stimulus between charts. OSS-3 may seem more complicated, but it is done for us by the computer. The advantage of the OSS-3 method is that it is theoretically more robust against between-chart differences in sensitivity or amplitude/response level. Also, the OSS-3 numbers are very easy for field examiners to understand, because the standard z-scores act like 7-position scores with decimal precision.

One could, using the published information, construct a spreadsheet to calculate either OSS-3 or CPS scores. I have replicated the CPS algorithm in Excel from the published information. It did require that I train or develop my own discriminant function, using our own normative development sample data. CPS developers have published descriptions of the method, but regard their discriminant function as intellectual property. OSS-3 is a completely open-source model. If anyone wants it, all of the information is available. OSS-3 is available now in most computerized polygraph systems - which makes sense to do because it is comprehensive, completely documented and accountable, and now belongs to the entire polygraph profession because of the open-source and free development approach. The only caveats at the present time are that Stoelting does not yet have it available, and that the Axciton implementation of OSS-3 appears to be sorely incomplete to the point of unusability - and the numerical results provided by the Axciton OSS-3 are of unknown accuracy.

It would also be possible to calculate OSS-3 or CPS scores by hand - with a paper and pencil, or a handheld calculator. But you would have to develop your own discriminant function for CPS using your own sample data. As you point out, that would cure insomnia. It would also require someone to have a much greater understanding of how the math and probability functions work. So why bother? It is all done for you by the computer software. It would be a good idea to gain a general understanding of the process - and many field examiners are able to understand what the algorithm does with the data.

Skar is correct to be interested in a probability value - because that is the answer to the questions of science - what is the level of statistical significance or probability of error associated with the test result? At the present time, some of the available commercial algorithms cannot even answer this question - because they lack a decision model, or they lack an adequate basis in statistical decision theory. So these will never impress scientific thinkers. Also, most of our manual scoring methods cannot answer this question for either single-issue (including multi-facet) exams or multiple-issue screening exams.
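As a rough illustration of the kind of transformation described above - log the R/C ratios, standardize them against norms, and combine the channels with weights - here is a minimal sketch. The ratios, norms, and weights are invented and are not the published OSS-3 parameters:

```python
# Minimal sketch of the style of transformation described above: log the
# R/C ratios, standardize them as z-scores against (invented) norms, then
# combine the channels with (invented) weights.
import math

# Hypothetical R/C ratios (RQ response / mean CQ response) for one RQ on one chart.
rc_ratios = {"pneumo": 0.95, "eda": 1.60, "cardio": 1.20}

# Hypothetical norms for the logged ratios: (mean, standard deviation).
norms = {"pneumo": (0.0, 0.10), "eda": (0.0, 0.40), "cardio": (0.0, 0.20)}

# Hypothetical channel weights.
weights = {"pneumo": 0.2, "eda": 0.5, "cardio": 0.3}

z_scores = {}
for channel, ratio in rc_ratios.items():
    mean, sd = norms[channel]
    z_scores[channel] = (math.log(ratio) - mean) / sd  # standardized logged ratio

weighted_score = sum(weights[ch] * z for ch, z in z_scores.items())
print(z_scores, weighted_score)
```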
Error rates - which are not the same as p-values and do not provide the level of statistical significance - have typically been calculated and reported only in terms of frequencies (counts and percentages), which can be used to calculate Bayesian accuracy estimates, which in turn are non-robust against unknown differences in field base-rate conditions and therefore cannot be reasonably or reliably generalized to field use. This same problem applies to all of the algorithms that do not or cannot calculate a p-value or level of statistical significance.

The ESS is, I believe, the only scoring method which can provide the level of statistical significance (p-value or probability of error) for a manual score. The ESS is also surprisingly simple, and every aspect of it is based on evidence from scientific studies (as opposed to un-studied theorizing). ESS can provide a p-value (level of statistical significance or probability of error) for any type of exam, including single-issue, multi-facet, and mixed-issue exams. ESS can do this because we took the time to do the research and study the normative data using sophisticated computerized Monte Carlo methods. Now that it is done, we have a simple and accurate manual scoring procedure, and a probability value that is robust against unknown changes in field base-rate conditions.

What would be best is to have both manual and computer algorithm scores that agree. Most of the time they will agree. Sometimes they will not. If we have permission to access and understand the knowledge and science (open source) upon which the result is based, then we can often understand and determine - and learn from - the cause of an occasional disagreement between computer and manual scores. If we don't have access to the knowledge, then we are in the same position as a bunch of Neanderthals inventing mythology to explain things like the weather and fertility. I suggest we insist on access to the knowledge.

Of course, nothing is perfect, and there will always be occasional disagreements between computer and manual scores. If we make fictitious claims of perfection, then the smart scientific-minded folks will eventually observe some imperfection and pull our pants down in public. Everything in science is a probability, and probability theory implies the probability of random events and errors. Our job - if we want to claim the polygraph is a scientific test and not just a form of art or sorcery - is to have a method to calculate and report the probability of error for our test results. Some few random errors will always occur. We will sound silly and child-like if we feel compelled to have an explanation for them. It is interesting, but not necessary, to explain everything, and it often puts us in the mind-reading and mythology business to try too hard to explain things whose causes we don't completely understand. We simply calculate the probability of error based on our normative data. Errors, in science, are due to something called "uncontrolled variance" - which is geek-speak for "sh$% happens." r
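The general computation described above - comparing a grand total against normative distributions of truthful and deceptive scores to report a probability of error - can be sketched like this. The norms below are made up for illustration and are not the published ESS or Monte Carlo norms:

```python
# Sketch of reporting a probability of error for a manual grand total by
# comparing it against normative score distributions. Norms are invented.
from statistics import NormalDist

truthful_norms  = NormalDist(mu=+6.0, sigma=5.0)   # hypothetical truthful score norms
deceptive_norms = NormalDist(mu=-6.0, sigma=5.0)   # hypothetical deceptive score norms

def probability_of_error(grand_total):
    if grand_total < 0:
        # Calling this deceptive: error probability is the chance a truthful
        # examinee would score this low or lower.
        return "deceptive", truthful_norms.cdf(grand_total)
    else:
        # Calling this truthful: error probability is the chance a deceptive
        # examinee would score this high or higher.
        return "truthful", 1 - deceptive_norms.cdf(grand_total)

print(probability_of_error(-9))   # ('deceptive', ~0.0013)
print(probability_of_error(+4))   # ('truthful', ~0.023)
```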
------------------ "Gentlemen, you can't fight in here. This is the war room." --(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)
IP: Logged |
Barry C Member
|
posted 08-07-2010 10:30 AM
It was the logged R/C ratios I was trying to avoid. I can still see the dizzied faces from the seminar....

This is all important stuff, and it's worth reading Ray's response a couple of times if that's what it takes. If you haven't seen this little report, then you need to: http://www.ascld.org/files/releases/NAS%20Executive%20Summary%20090218.pdf If that doesn't work, Google this: Strengthening Forensic Science in the United States: A Path Forward and look for the executive summary.

The days of just being a polygraph "operator" or technician are soon to be over. If we don't catch up with the rest of the forensic science community (and we are ahead of some) in efforts to conform to the changes that are coming, then we'll soon be a discipline of the past. This "science" stuff, including science-based standards and individual certification of competency, is coming whether we like it or not. The wheels are already turning at the federal (congressional) level, and we have to move ahead or be relegated to junk science - which is where some want us already. You'll note that if you search that document, there is no mention of polygraph at all, and some believe that is by design. Just so you know, the APA and AAPP are working together on this. IP: Logged |
rcgilford Member
|
posted 08-08-2010 05:45 AM
"...the Axciton implementation of OSS-3 appears to be sorely incomplete to the point of un-usability - and the numerical results provided by the Axciton OSS-3 are of unknown accuracy."Ray, Should we not use OSS 3 with Axciton?
IP: Logged |
rnelson Member
|
posted 08-08-2010 10:49 AM
RC, at this point I do not see how you could make much use of the OSS-3 in the Axciton system. It is incomplete to the point of unrecognizability. Results, as far as I can tell, are shown in a tiny on-screen dialog box for which you have the single option of clicking OK to make it go away. No printing. No additional information. The numerical result is a single number of unknown definition, along with an interpreted result. I do not know what the number is. It could be a logged R/C ratio, or it could be a standardized logged R/C ratio - but there is no effort to provide us with any useful information about it. There is no p-value, and the alpha is unknown. There are no options for decision policies for different types of exams (i.e., single-issue, multi-facet, multi-issue) - and these are very different statistical and decision-theoretic problems. There are no measurements to observe, no way to ensure that artifacted data are not scored, and no way to audit what was done with the many settings and options included in OSS-3. All of these features are central to OSS-3, and are available in the fully functional Excel model which we made available.

Lack of features, lack of information, lack of access, and lack of empowerment is a good example of exactly why I wanted to build and disseminate an open-source algorithm in the first place. Lack of access, lack of empowerment, lack of information, and lack of features serves only to treat us field examiners like step-children who are too dumb to manage and understand our own work. It keeps us locked in the dark to the point where we barely understand the scientific and statistical principles that make the polygraph work, and keeps us in the position of looking and sounding like outsiders in conversations in the broader fields of forensic science and testing. No wonder we are afraid of algorithms - we have, in the past, been kept nearly completely in the dark about what they do and what they are capable of doing. Proprietary interests have seemed, at times, to interfere with the development of generalized knowledge and progress in our profession.

We have, I believe, a responsibility to be good stewards of our profession - and to give our profession every opportunity to progress and improve in a manner that allows us to be taken seriously and recognized by related fields of science. Instrument manufacturers have a natural role in this, as the instruments we use will, in some ways, shape and define our understanding of our work and our science.

OSS-3 is built on the premise of empowerment and information for the field examiner. The spreadsheet model which we made available is a completely functional example of every feature we designed, developed, and validated. A few years ago, we made it available and offered technical assistance to all of the instrument manufacturers. I have no idea why anyone would include OSS-3 in such an unsatisfactory way. Why bother implementing it at all? Of course, OSS-3 is still available as an open-source project, and all of the information on OSS-3 is still available. A few people are using the OSS-3 Excel spreadsheet. r

.02
------------------ "Gentlemen, you can't fight in here. This is the war room." --(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)
IP: Logged |
Bill2E Member
|
posted 08-08-2010 11:38 PM
What happened to an examiner hand scoring his charts, then checking the algorithms in OSS-3 and other computer-generated scoring? I totally believe you should hand score every chart and, after having completed your work, check the computer scores. If there is a significant difference that would alter the call you made by hand scoring, start looking for the reason. We also have QC, which backs up the examiner's hand score. IP: Logged |
rnelson Member
|
posted 08-09-2010 11:01 AM
Good points, Bill. A couple of things to think about. We have growing piles of research to work with on this. One of the things about research is that it's a little like target shooting - sometimes the bullets don't all go through the same holes, but if you poke enough holes in the target you'll start to see the pattern. It is the pattern that matters most, not the individual holes. And sometimes there is a wild shot. What we want to do is understand why those wild shots occur, learn how to reduce the level of variance in the pattern, and tighten the group - and at the same time not get so obsessive or concrete in our thinking that we simply make up an answer, just to have one right now, when we may not yet be able to learn or know all the reasons that everything happens.

That said, the patterns you'll see in the published studies on scoring seem to tell us some basic trends. First, original examiners have tended to outperform blind scorers. This may bother scientific folks who want to know why, but to field examiners it doesn't matter. What matters is that we probably should not aggressively second-guess the original examiner unless an obvious error has occurred. By "obvious error" I mean the kind of error that violates a principle of science, not simply one that violates an arcane procedural rule. We have sometimes confused the concept of validity (which is a scientific concern) with compliance (which is an administrative concern).

Second, well-developed computer algorithms have tended to perform as well as or better than many human scorers. Sure, a few exceptional human scorers have outperformed the algorithms at times. The NAS (like 'em or not, they are not dumb) told us and Congress that the median accuracy of the polygraph seems to be near 86 to 89%. That is the median - which means that half of the reviewed studies were less accurate than that. As Barry has pointed out in the past, half of all people actually fall below the average score. So, computer algorithms are possibly more accurate at scoring exams than most examiners. Think about it - only a few people might outperform the algorithms, and the algorithms might be more accurate than the average examiner. Now consider the fact that there is no credible profession in which the professionals are afraid to use computers and fancy math - and we would never trust an engineer or scientist who refused to use a computer. When people ask us - and they will - whether we used a computer, what will our answer be? "No?" "Why not?" Is it because we are afraid we are not smart enough to understand them? How professional and scientific is that?

Now consider that most, if not nearly all, of our manual scoring methods continue to emphasize procedural rules as a stand-in for scientific validity. Some of our rules and assumptions are not actually supported by scientific studies (time bars, symptomatics), and some of them will sound kind of silly to scientists outside of polygraph (the exception to homeostasis - which has little if anything to do with homeostasis).

Anyway, I agree that you should manually score your charts, then reconcile and understand any disagreement. We can do this rather easily when we begin to understand what the computer does with the data.

So, getting back to Polyscoring's original question - about a scoring method that is practically universal. I don't think we actually want that yet, or any time soon.
To suggest a universal scoring method is to cater to the simplistic demands of simple-minded folks who are agitated by ambiguity and unknowns - and who prefer to pretend we have all the answers when we do not. If we said there was a universal and correct way, we would encourage all the naive impulses to force everyone in the profession to play a child's game of follow-the-leader - because the leader is always right about everything. The real result is that we would then fail to learn anything new for the next several decades, and we would be perceived by our neighboring scientific professions as falling even further behind on our own learning curve.

What we need is to foster a professional system of values and attitudes that helps us to accurately evaluate the value and usefulness of various solutions, and that emphasizes our need to follow the data - and refrain from childhood games like follow-the-leader. A professional and scientific approach can tolerate the ambiguity and uncertainty of not knowing everything, while still appreciating that we know a lot - enough right now to know what the best thing to do is (follow the data).

A problem with most manual scoring systems is that they cannot answer the difficult questions of science - what is the level of significance of the test result? And how do we calculate the probability of error for a mixed-issue screening exam? Our solution in the past seems to have been to ignore these questions, and we are now feeling threatened with the potential consequences of that kind of avoidance. There are answers to these questions, but it does require our curiosity and capacity to learn. Continuing to do and teach solutions that were satisfactory 10 or 20 or 40 years ago is not going to move us forward any more than it would to build our new cars the way we did in decades past. We need answers to the questions of science. But first, we need to understand and appreciate those questions of science - which is why we need to know both how and why our manual and computer scoring methods work.

.02

r
------------------ "Gentlemen, you can't fight in here. This is the war room." --(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)
IP: Logged |
arewethereyet Member
|
posted 08-09-2010 06:14 PM
In response to Dr. Nelson's comments about Axciton and OSS-3: Axciton respects the good and capable Dr. Nelson, as well as his skilled contributions to the polygraph research community. The other various people who have contributed to the OSS-3 approach to an algorithm are worthy of appreciation and recognition for their good efforts as well. We hope that over time OSS-3 will continue to evolve and improve in more robust and adaptive ways. At the request of some of our users, we include OSS-3 as one of the 6 or 7 Axciton algorithms an examiner may choose to use. We have implemented OSS-3 as best as its Excel spreadsheet, and other papers, describe its structure, and we respect the good Dr. Nelson's opinion that there should be a larger, more detailed output screen. In the future, we at Axciton may expand the OSS-3 output screen with more detail if our customers request it.

Now, with all due respect to the good people who have worked on OSS-3, I will describe where I think this algorithm is deserving of constructive criticism...

1. The heavy use of 'bootstrapping' of a small population of good exams into a very large fictitious population effectively creates a false reality devoid of the ugly, messy sessions that really make up reality. Things like countermeasures, purposeful non-compliance, pacemakers, medical problems, etc., are omitted from such bootstrapped illusory populations. This bootstrapping approach is similar to cherry picking any difficult polygraph cases out of a more true population.

2. In principle, more complex rule sets, such as DACA's 3-rule hand-score type methodology, contain more potential accuracy than simple one-dimensional rules such as the 'total' which OSS-3 uses.

3. Lastly, the Excel spreadsheet presented to us describing OSS-3's rules appeared to have over 20 major conditional if-then-else conditions. This number of conditional rules seemed especially large for a small 'bootstrapped' population.

In summary, Axciton respects the good effort made by the skilled people who developed OSS-3, and we continue to encourage and support them in their continuing research efforts. IP: Logged |
Barry C Member
|
posted 08-12-2010 08:16 AM
quote: 1. The heavy use of 'bootstrapping' of a small population of good exams into a very large fictitious population effectively creates a false reality devoid of ugly messy sessions that really make up reality. Things like countermeasures, purposeful non-compliance, pacemakers, medical problems, etc., are omitted from such bootstrapped illusory populations. This bootstrapping approach is similar to cherry picking any difficult polygraph cases out of a more true population.
I think this is a bit oversimplified and misleading. There is a problem making generalizations from any sample. That's the nature of the beast. However, OSS-3 is designed only to score tests you, the examiner, would score. If you had a PNC, CM, etc., test, you likely wouldn't score it either, so why would you expect the algorithm to do so? The algorithm wasn't designed to score bad data. (It is interesting, however, that the CPS algorithm was trained on "good" data (though those researchers accept as "good" a lot uglier data than I would put up with), and in some of the CM studies the CMs were ineffective against the algorithm even when the CMs caused problems for the hand-scorers.) quote: 2. In principle, more complex rule sets, such as DACA's 3 rule hand score type methodology, contain more potential accuracy than simple one-dimensional rules such as 'total' which oss-3 uses.
Could you expand on this one? I don't follow you. quote: 3. Lastly, the excel spreadsheet presented to us describing oss-3's rules appeared to have over 20 major conditional if-then-else conditions. This number of conditional rules seemed especially large for a small 'bootstrapped' population.
So, if I understand this correctly, you bury all that behind a single number nobody can understand? That's not a solution. That makes for a bigger problem than you believe to exist in the first place. If you see it as problematic, then why don't you make sure it's disclosed in the results window?
IP: Logged |
skipwebb Member
|
posted 08-12-2010 12:26 PM
The OSS-3 algorithm can be used with little difficulty using the "Extract" program. One need only run the Extract program to isolate the raw data from the Axciton charts and then copy and paste that numeric data into the OSS-3 spreadsheet.

One must use caution, however, and ensure that data from tracings that contain artifacts are removed, much the same way that we do in hand scoring. "Extract" allows the user to effectively "zero" out bad data selectively, by component and by question, before the numbers are entered into OSS-3. As I understand it from Ray, once the numbers from "Extract" are plugged into the spreadsheet, the manner in which OSS-3 utilizes that data is no different from the other software vendors' data. Ray can elaborate on this with much more clarity than I can muster.

I once naively asked Dr. Honts in an email if using three-position scoring rather than 7-position scoring would have an adverse effect on the efficacy of the Utah testing format. His answer was quite telling. To paraphrase him, as I no longer have the email... "well, it won't be any worse than what you are doing now."

In a perfect world in which all examiners carefully removed from consideration all adversely affected data that was "tainted" by artifacts, I personally believe that the computer algorithms would "outscore" humans hands down. Unfortunately, the nature of the computer is to push the "go" button and accept the results as the gospel according to Bill Gates. Diagnosing polygraph data epitomizes the old adage "Garbage in = Garbage out." We can't expect any algorithm to perform well if the data is not vigorously "scrubbed" prior to use. Think of it this way: how well would your car function if you put gasoline in the tank that contained water and sugar?
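Something like that clean-up step - dropping artifacted measurements before they reach the scoring stage - could look like the sketch below. The field names and artifact flags are hypothetical and are not the Extract program's actual output format:

```python
# Sketch of the clean-up step described above: zero out any measurement
# flagged as artifacted so it cannot influence a score. Field names and
# values are hypothetical.

measurements = [
    {"question": "R1", "channel": "eda",    "value": 812.0, "artifact": False},
    {"question": "R1", "channel": "cardio", "value":  45.0, "artifact": True},   # movement artifact
    {"question": "C1", "channel": "eda",    "value": 640.0, "artifact": False},
]

def scrub(rows):
    """Zero out artifacted measurements, leaving clean data untouched."""
    return [
        {**row, "value": 0.0} if row["artifact"] else row
        for row in rows
    ]

for row in scrub(measurements):
    print(row["question"], row["channel"], row["value"])
```

IP: Logged |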
skar Member
|
posted 08-19-2010 08:59 AM
Raymond Nelson, if your OSS spreadsheet is available for free, I would appreciate it if you would send it to me or tell me how I can get it. Also, please tell me where I can read about the Kircher features for OSS, and what are the best computer algorithms for R/I and CIT tests? Thanks. skarpost@gmail.com
[This message has been edited by skar (edited 08-19-2010).] IP: Logged |
Barry C Member
|
posted 08-19-2010 07:53 PM
Reminder: OSS-3 info is available here: http://www.oss3.info/ IP: Logged |
Bill2E Member
|
posted 08-20-2010 11:43 AM
Barry, does OSS-3 work with Stoelting or only with Lafayette? IP: Logged |
Barry C Member
|
posted 08-20-2010 08:28 PM
Lafayette and Limestone have it in their software. Stoelting doesn't yet, but it will be in CPS-Pro, last I knew. You can use it with Stoelting now, though. You just need to paste the Kircher measurements into the OSS-3 spreadsheet, which you can get from Ray. IP: Logged |
rnelson Member
|
posted 08-24-2010 11:06 AM
arewethereyet: quote: In response to Dr. Nelson's comments
I will describe where I think this algorithm is deserving of constructive criticism... 1. The heavy use of 'bootstrapping' of a small population of good exams into a very large fictitious population effectively creates a false reality devoid of ugly messy sessions that really make up reality. Things like countermeasures, purposeful non-compliance, pacemakers, medical problems, etc., are omitted from such bootstrapped illusory populations. This bootstrapping approach is similar to cherry picking any difficult polygraph cases out of a more true population. 2. In principle, more complex rule sets, such as DACA's 3 rule hand score type methodology, contain more potential accuracy than simple one-dimensional rules such as 'total' which oss-3 uses. 3. Lastly, the excel spreadsheet presented to us describing oss-3's rules appeared to have over 20 major conditional if-then-else conditions. This number of conditional rules seemed especially large for a small 'bootstrapped' population. In summary, Axciton respects the good effort made by the skilled people who developed the oss-3, and we continue to encourage and support them in their continuing research efforts.
There are so many errors of fact in this that I don't know where to begin. First, you address me as Dr. Nelson three times, which is both inaccurate and excessive. Yet you do not identify yourself clearly.

Some researchers consider 300 cases to be a large sample. Of course there are many studies of much larger scale (e.g., the General Social Survey) that make this look small. The GSS, for example, has over 50,000 respondents and over 5,000 variables. But it was conducted every year for nearly 20 years beginning in 1972, and every other year since 1994. Does anyone have access to very large samples of confirmed polygraph case data? The Axciton cases that I have seen shown at polygraph conferences seem to me to have the same case numbers as the cases in the DACA confirmed case archive. Keep in mind that one of the first things unprepared critics will tend to do is to attack the sample when they don't like a study or its results. So, I will kindly ask that we engage in a realistic discussion of this matter.

What matters more than the size of the sample – once a sample has sufficient size to avoid granularity or step-wise effects due to frequency counts and small integers – is whether the sample is representative. Larger samples, like the GSS, have the advantage of having stratified random sampling opportunities in which some data can be collected from different cities or regions of the country. All samples are imperfect representations of the population – in science we say they are "biased." Every graduate student and research trainee knows this. Biased, in science, means only that they are imperfect.

There are four major approaches to managing sample bias: 1) larger sampling distributions that can randomly capture more representative information; 2) constructing a distribution of many smaller sampling distributions – as is often done with IQ and personality tests; 3) conducting non-random or matched, stratified sampling – which sounds odd but is used very effectively in market research and political polling; and 4) computer-based bootstrapping and Monte Carlo models that allow us to make maximum use of available data, use the laws of large numbers to reduce the biasing influence of outlier or non-normal cases on normative parameter estimates, and calculate statistics that would be exhaustively difficult to do by hand. Someone please tell me where, in the sciences, it is unacceptable, or not recommended, to use computers to study and solve complex mathematical and measurement problems.

A fifth approach is more or less atheoretical. It is to not care, simply assume the data are representative in some way, and use machine learning techniques – sometimes described as data-mining or neural networking. These work OK, but you never quite know how they arrived at their results. You use them when you don't need to know, and when you have massive amounts of data and massive computing power to mine it (Google).

What you refer to as "heavy" use of bootstrapping is not heavy. There were two bootstrap training stages for the OSS-3. Each was 10,000 iterations of the sample of nearly 300 cases. Bootstrapping with 30,000 or more iterations is quite common – look at the Zellicof criticism of polygraph at the Anti site and you'll see he used 30,000 iterations of his Monte Carlo. Keep in mind that I conducted all this "heavy" bootstrapping on a consumer-grade notebook computer – and I now use a tiny netbook. Wow!
The first bootstrap operation was to develop norms for the individual components (logged R/Cmean ratios for pneumo, EDA, and cardio) for all cases, truthful and deceptive combined. These norms are used to standardize all scores. Standardizing the scores sets the mean or average component score at zero for all components. We can then determine which scores are of positive or truthful valence and which are negative. We can also average them together without violating the mathematical requirement for correct and responsible averaging – that all data have the same variance. (Remember: standard scores all have a mean of 0 and a standard deviation of 1.) The algorithm could have been built without this bootstrap stage, and the data could be standardized using the raw mean and variance values from the sample.

Which brings us to another of your inaccuracies. Bootstrapping is NOT cherry-picking good cases. Like all statistical methods, it assumes the data are representative. If you believe the normative data are not representative, then you should simply not use the algorithm. We make no pretense about our data being representative of special populations such as persons with severe medical or psychiatric conditions. Our first challenge is to construct and validate an algorithm that works with normal persons, not special populations. If anyone claims to have an algorithm that is normed and validated on special populations such as persons with severe medical or psychiatric conditions – they should document it, publish it, and show us the mathematical procedures, development and validation experiments, and normative data.

The medical and psychiatric communities have the same challenge we have – get their tests to work with NORMAL cases first, and then evaluate the test with exceptional persons. An alternative to the normal approach is to develop a test around the special population. The Millon Clinical Multiaxial Inventory – III (MCMI-III) is a good example of this. While the MMPI-2 was developed on normal controls, the MCMI-III was developed on criminal and forensic patients. The MCMI-III is, in some ways, more effective than the MMPI-2 at mapping the psychopathology of persons in criminal and forensic settings. But because the MCMI-III norms are criminal and forensic norms, it is considered inappropriate to use this test with non-criminal, non-forensic persons. For non-forensic persons the MMPI-2 is better – because the MCMI-III will tend to pathologize everyone rather badly.

Anyway, the second bootstrap operation, also 10,000 iterations, was used to develop separate normative parameters (means and standard deviations) for the confirmed truthful and confirmed deceptive cases – regardless of the scored algorithm result. These norms are used to calculate the level of statistical significance and probability of error for each individual case. Again, this "heavy" bootstrapping can now be done on a tiny netbook computer. And again, the OSS-3 algorithm could be normed without bootstrapping, using the raw sample norms. I believe it is better to use the bootstrap norms, because bootstrapping is a conceptually simple, though computer-based, method of calculating normative parameter estimates that are less biased than those that are achieved from the raw data. This is because the influence of outlier and exceptional cases is reduced when calculating average values for the mean and standard deviation. (Averaging is, of course, non-resistant to outlier values, and no research will ever be conducted without averaging.)
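For anyone who wants to see what that amounts to in practice, here is a stripped-down sketch of bootstrap norm estimation: resample the development scores with replacement many times, compute the mean and standard deviation of each resample, and average the results. The scores below are randomly generated stand-ins, not the OSS-3 development sample:

```python
# Stripped-down sketch of bootstrap norm estimation as described above.
# The "development_scores" are random stand-ins, not real case data.
import random
import statistics

random.seed(1)
development_scores = [random.gauss(0.0, 1.0) for _ in range(300)]  # stand-in sample

def bootstrap_norms(scores, iterations=10_000):
    means, sds = [], []
    for _ in range(iterations):
        resample = random.choices(scores, k=len(scores))  # sample with replacement
        means.append(statistics.mean(resample))
        sds.append(statistics.stdev(resample))
    return statistics.mean(means), statistics.mean(sds)

mean_estimate, sd_estimate = bootstrap_norms(development_scores)
print(mean_estimate, sd_estimate)
```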
"Less biased" means "more generalizable" to you and me in field polygraph settings. Keep in mind that what I believe is not what matters. What matters is what the evidence says – stay tuned for that. It would be great if all algorithm developers made their normative data and development data completely transparent and available for this type of discussion. We would all learn something, and the rate of knowledge acquisition in the polygraph profession would rapidly accelerate to match that of our neighboring scientific professions.

Regarding your second criticism, on decision rules and complexity – I have never seen any documentation on a "DACA 3-rule handscoring type methodology." Our studies have rather consistently replicated the findings of Senter and Dollins at DACA. For OSS-3 we selected the optimal solution based on the evidence we have, and that evidence is available. Please show us some documentation so we can have a real discussion about decision rules. If you want to see the general trend in the data, just read the studies by Senter and Dollins. Then notice that while the Utah techniques have rather consistently slightly outperformed other techniques, they have done this while employing the simplistic grand-total decision rule. In science, the more complicated and fancy an idea is, the more it requires proof in the form of evidence and data, and cannot depend on a theoretical statement of "in principle." Theoretical statements are simply hypotheses that must be proved or discarded. Where is the proof?

And regarding your third criticism, that the spreadsheet model has too many nested conditional IF statements for a model built on a small sample - two things. First, the size of the sample has NOTHING to do with the construction of conditional statements. Second, the OSS-3 spreadsheet was started in a free and open-source spreadsheet application (OpenOffice.org), which can read and write Excel spreadsheets. The development model was then moved to and completed in Micro$oft Excel 97 so that it could ensure compatibility with more computer systems. That is Excel 97, not 2007. Excel 97 allows a maximum of six (6) nested IF statements. Not 20. Excel 97 will not recalculate, and will warn the user of a calculation error instead, if you attempt to nest more than 6 conditional statements.

One of the great things about computer programming languages, compared to spreadsheets, is that they are designed to accomplish all these complicated things much more easily – if you know how to use the programming language. A spreadsheet application for a development or lab model has the advantage of being accessible to many more people than a programming language. And our spreadsheets could evidently accomplish our goals as recently as 13 years ago (1997), though our computers were a little slower back then. It appears that you have simply made up something inaccurate to sling as a criticism.

Keep in mind that spreadsheet calculations are shotgun type - all cells and calculations are performed every time the worksheet recalculates, which is every time anything is done. Expensive in terms of computing cycles. Error handling is not handled by the environment: all calculation errors must be trapped by the individual formulas within each spreadsheet cell. When you consider the range of powerful options and features included in OSS-3, that does get to be some complicated logic.
Other instrument manufacturers have found great solutions to the challenge of implementing OSS-3 and providing field examiners with its powerful features and information. Axciton could too – if it has the same capabilities and the same motivation. The OSS-3 spreadsheet model was designed to be fully functional, to allow many options, and to recognize question rotation, repetition, omission, and some degree of "creative" error in the field. It includes translations for Spanish, French, and German, and is designed to handle both individual cases and large samples of cases. The spreadsheet manages input and errors, and contains formatted printable reports that are rich in information for the field examiner, program manager, and QC reviewer.

I will, as time allows, retrain the algorithm without the bootstrapped norms, so that people can see for themselves whether they like the bootstrap solution. The real test of effectiveness is, of course, whether the algorithm seems to generalize to field settings – whether it correlates with our experience, expert senses, and manual scores. There is no such thing as a perfect algorithm in polygraph or any other field. There are many, many compromises to consider, and there will always be some degree of error. Our challenge is to develop and validate – and publish – a method to calculate the likelihood or probability of error of a polygraph test result (that is, if we want to be taken seriously by the courts and sciences).

Without accountable science, based in measurement, math, and published methods that are common to the profession and cross-platform, we are just having a professional playground discussion of "oh-yeah, says-you," and hoping that the courts and sciences, and our consumers, will buy it. Without published evidence that is based in measurement, math, and data, all we have is a well-dressed version of "because I said so." Or we could just take the low road of "baffle-them-with-BS," and sling around useless and confusing jargon that cannot be realistically evaluated by other scientists. Even if the jargon is accurate, it will not advance the profession if normal professionals cannot take it home to study it and learn it. Scientists will not endorse anything they cannot replicate. Polygraph should be a contest of data, evidence, and science, not a bully-contest or beauty-contest of resumes and professional CVs.

One of the great things about an open-source and accountable algorithm is that people are free to actually study it and make actual criticisms or suggestions about ways to optimize it or improve it. Certainly a discussion about the representativeness of a sample is always appropriate, but it is the effective generalization to field settings and validation with other samples that settles the argument and discussion about the representativeness of sample-based norms. Criticizing the use of computerized methods is just silly. Discussion about decision rules and decision theory is also appropriate. But it will accomplish nothing to talk in hypotheticals and vague statements, and it will do no good to neglect to follow the data and studies that are already done. Slinging nonsense accusations about nested IF statements is again just silly. Computers require logic – lots of it.
If anyone is too afraid to depend on complex logic and computers – just log off now, burn your cell phone, and move to whatever wilderness remains where you can live a completely natural life raising your own goats that supply all your food, fuel, and clothing.

What we have done, without any funding, is to build a completely accountable algorithm that can calculate the probability of error for single-issue, multi-facet, and multi-issue exams. We have also developed and documented a manual scoring method (ESS) that can now accomplish these same objectives. We have shown our data, and shown our math. Where the data contradict our opinion, we do the scientific thing and set our egos and opinions aside so we can follow the data. When there is something we need to know but don't, we do a study and learn the answer from data, not opinion. This, I believe, is the way to win the long game of survival and credibility for the polygraph profession among the sciences and courts.

What we have also done is to create an algorithm that anyone can take home to study and learn – and we have invited criticism, whether accurate or inaccurate. It would be great if all of the polygraph scoring algorithms were similarly transparent. Again, part of the motivation for OSS-3 was the lack of access to any real information about what algorithms do, and the lack of any opportunity to study the model, learn more, test possible improvements, and engage the collective intelligence of the profession. There is nothing inherently wrong with proprietary interests. But it is my opinion that proprietary and market interests have, at times, restrained the collective intelligence of the polygraph profession. Open-source initiatives have, at times, prompted a level of competition, transparency, and knowledge acquisition that can be very beneficial and healthy for a profession – because they keep a profession from becoming indentured to a single source of technology and knowledge.

Progress,

r
------------------ "Gentlemen, you can't fight in here. This is the war room." --(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964) [This message has been edited by rnelson (edited 08-24-2010).] IP: Logged | |